(54) Method for coding the random component vector in an ACELP coder



(57) An ACELP speech coding method according to G.729. When a random component vector is coded, each of the random component vectors which together form the random codebook is formed of three or fewer pulses having a unit amplitude for each of a pair of subframes which together form a frame. The positions of the pulses are determined from a plurality of predetermined positions which a pulse can assume in a subframe so that a distortion is minimized.
Description

BACKGROUND OF THE INVENTION 

The invention relates to a method of speech coding which is arranged in the same manner as the ITU International Standard 8 kbit/s speech coding scheme CS-ACELP (G.729) and which is employed to provide speech coding at a lower bit rate.

Various efficient coding schemes are attempted in 
the field of digital mobile communications for an efficient 
utilization of radio waves. Known schemes for speech 
coding at information rate on the order of 8 kbit/s include 
CELP (code excited linear prediction), VSELP (vector 
sum excited linear prediction), CS-ACELP and the like. 

For details of these coding schemes, refer to "Code-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates" by M.R. Schroeder and B.S. Atal in Proc. ICASSP '85, 25.1.1, pp 937-940, 1985 (literature 1), "Vector Sum Excited Linear Prediction (VSELP) Speech Coding at 8 kbps" by I.A. Gerson and M.A. Jasiuk in Proc. ICASSP '90, S9.3, pp 461-464, 1990 (literature 2), and "ITU-T 8 kbit/s Standard Speech Codec for Personal Communication Services" by A. Kataoka et al in Int. Conf. On Universal Personal Communication, pp 818-822, 1995 (literature 3). For details of the 8 kbit/s International Standard G.729 (CS-ACELP), refer to ITU-T Recommendation: G.729 Coding of speech at 8 kbit/s using conjugate-structure algebraic code excited linear prediction (CS-ACELP), COM 15-152-E, July 1995 (literature 4).

Fig. 1 shows an example of a coder used in such schemes, including an input terminal 11, an adder 12, a subtracter 13, a filter coefficient determination part 14, a filter coefficient quantizer 15, a synthesis filter 16, a perceptual weighting filter 17, a distortion power calculator 18, a code output part 19, an adaptive codebook 21, a random codebook 22, an estimated gain part 23, a gain part 24, a gain estimation part 25, a codebook search part 26, a gain codebook 27 and an LSP codebook 28.

Referring to Fig. 1, an input speech signal waveform is applied to the input terminal 11, and a given number of samples (hereafter referred to as a speech waveform vector) are extracted from the sample train of the waveform every frame of 10 ms and fed to the filter coefficient determination part 14, where linear prediction coefficients (or LPC coefficients) are calculated. The LPC coefficients are converted into LSP coefficients in the filter coefficient quantizer 15, where they are quantized by reference to the LSP codebook 28. The quantized LSP coefficients have their quantized codes $I_{sp}$ delivered and are also converted back to LPC coefficients to be set up in the synthesis filter 16 as filter coefficients.

The adaptive codebook 21 stores exciting vectors over a plurality of past frames as pitch component vectors which adaptively change. A pitch component vector candidate P is chosen from the plurality of pitch component vectors, and a random component vector candidate C is chosen from a plurality of fixed random component vectors (or random number vectors) contained in the random codebook 22. Gains $g_P$, $g_N$ chosen from the gain codebook 27 and forming a gain vector candidate $g = (g_P, g_N)$ are applied to the candidates P, C in multipliers 24P, 24N, respectively, of the gain part 24, and the resulting products are added together in the adder 12 to be fed to the synthesis filter 16 as exciting vectors, thus synthesizing a speech. The gain estimation part 25 predicts an approximate gain from past random component vectors, and this gain is then set up in the estimated gain part 23.

A synthesized speech is subtracted from the input speech waveform vector X, and a resulting error vector is perceptually weighted in the perceptual weighting filter 17 and fed subsequently to the distortion power calculator 18. The distortion power calculator 18 calculates the power of the perceptually weighted error (or distortion), and the codebook search part 26 selects respective candidate vectors from the adaptive codebook 21, the random codebook 22 and the gain codebook 27 so that the power of the distortion is minimized. The code output part 19 delivers indices $I_P$, $I_N$, $I_G$ representing these selected vectors, together with the code $I_{sp}$ which represents the quantized LSP coefficients, as coded outputs.

Fig. 2 shows an example of a decoder corresponding to the coder shown in Fig. 1, including an input terminal 31, an adder 32, a filter coefficient decoder 33, a synthesis filter 34, an adaptive codebook 35, a random codebook 36, an estimated gain part 37, a gain part 38, a gain estimation part 39, and a gain codebook 41. In the arrangement of Fig. 2, the received code $I_{sp}$ is fed to the filter coefficient decoder 33, where LSP coefficients are decoded and then converted into LPC coefficients, which are in turn fed to the synthesis filter 34 to be used as filter coefficients therein. The received code $I_G$ is decoded into a gain vector $(g_P, g_N)$ in the gain codebook 41 for use as gains $g_P$, $g_N$ in the multipliers 38P, 38N of the gain part 38.

On the other hand, a pitch component vector P and a random component vector C are read out from the adaptive codebook 35 and the random codebook 36, respectively, in a manner corresponding to the received codes $I_P$ and $I_N$. The pitch component vector P is multiplied by the gain $g_P$ in the gain part 38, while the random component vector C is initially multiplied by the estimated gain from the gain estimation part 39 in the estimated gain part 37 to be adaptively gain adjusted and is then multiplied by the gain $g_N$ in the gain part 38. The gain controlled pitch component vector and random component vector from the gain part 38 are synthesized in the adder 32 to be fed to the synthesis filter 34 as exciting vectors, whereby a decoded speech is delivered.

Fig. 3 shows a bit allocation for coding individual parameters used in G.729. In G.729, the frame length is equal to 10 ms, using 80 bits per frame. Of these, 18 bits are allocated to coding LSP coefficients. The coding of LSP coefficients takes place by way of a vector quantization in two stages as illustrated in Fig. 4. In the first stage, a 10-th order vector quantization is effected using a first stage LSP codebook having 128 candidates (7 bits). In the second stage, a 10-th order vector quantization is effected using a pair of LSP codebooks, a higher order one and a lower order one, each having 32 candidates (5 bits) to enable a 5-th order vector quantization. One bit is allocated for selection of prediction coefficients.

For coding a pitch component vector using the adaptive codebook 21, the frame is divided into a first 5 ms subframe and a second 5 ms subframe. 8 bits and one parity bit are allocated to the first subframe while 5 bits are allocated to the second subframe. For coding a random component vector using the random codebook 22, 17 bits, inclusive of 4 bits for the polarities of four pulses, are allocated to each subframe.

Fig. 5 shows predetermined positions which the four pulses can assume when a random exciting pulse structure to be used in coding the random component vector with the random codebook according to G.729 is realized by using four pulses in each subframe. Specifically, positions from No. 0 to No. 39 are defined in the 5 ms subframe at a spacing of one sample (0.125 ms), for example, and these 40 positions are allocated to pulses #0 to #3 as shown in the chart of Fig. 5, which conforms to G.729. As will be evident from the chart, eight positions are available for each of the pulses #0, #1 and #2 in tracks 0, 1 and 2, and thus each position can be specified by three bits. For pulse #3, sixteen positions are available in the two tracks 3 and 4. Thus the position can be specified by four bits. Hence, information representing the positions of the four pulses in each subframe can be given by 13 bits. In addition to the 13 bits, the sign (polarity) of each of the four pulses is given by one bit, thus using a total of 17 bits for each subframe.

For coding a gain vector with the gain codebook 27, 7 bits are allocated to each subframe as indicated in Fig. 3, thus using a total of 14 bits.
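The allocation described above can be checked with a short tally. The following Python sketch uses only the bit counts stated in the text; the dictionary keys and function names are illustrative and not taken from G.729.

```python
# Bit allocation per 10 ms frame, as stated in the description of Fig. 3.
# Key names are illustrative labels only.
G729_BITS = {
    "lsp": 18,                   # 7 (first stage) + 5 + 5 (second stage) + 1 (predictor)
    "pitch_subframe1": 8 + 1,    # 8 pitch bits plus one parity bit
    "pitch_subframe2": 5,
    "random_codebook": 2 * 17,   # 13 position bits + 4 polarity bits, per subframe
    "gain_codebook": 2 * 7,      # 7 bits per subframe
}


def total_bits(allocation):
    """Sum of all bits spent in one frame."""
    return sum(allocation.values())


def bit_rate_kbps(bits_per_frame, frame_ms=10):
    """Bits per millisecond is numerically equal to kbit/s."""
    return bits_per_frame / frame_ms


frame_bits = total_bits(G729_BITS)
print(frame_bits, "bits per frame,", bit_rate_kbps(frame_bits), "kbit/s")
```

The tally reproduces the 80 bits per frame and the 8 kbit/s rate quoted for G.729.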

It is to be noted that when a communication is performed with a codec according to the ITU International Standard G.729, it is possible that a sufficient transmission capacity may not be secured depending on the condition of a transmission path, presenting a problem that the communication may be disabled. While it may be contemplated to achieve the communication by using a coding scheme which requires less transmission capacity, this presents another problem that an entirely distinct coder and decoder combination is necessary. Accordingly, it is desirable in such an instance to reduce the bit rate of the signal without a significant degradation in the speech quality while allowing a code structure similar to that of the International Standard G.729 to be retained. However, it has not been known how the bit allocation to a particular part of the code structure can be reduced effectively without an accompanying degradation in the speech quality.

SUMMARY OF THE INVENTION 

It is an object of the invention to provide a speech 
coding method which permits a bit rate to be reduced 
without a significant degradation in the speech quality 
while conforming to the speech coding according to the 
International Standard G.729. 

In accordance with the invention, there is provided 
a speech coding method according to ACELP in which 
an LSP coefficient, a pitch component vector, a random 
component vector and gain vectors which are applied to 
the pitch component vector and the random component 
vector are coded using an LSP codebook, an adaptive 
codebook, a random codebook and a gain codebook, 
respectively, so that a distortion relative to an input 
speech waveform vector is minimized for each frame, 

comprising the step of coding the random com- 
ponent vector such that each of random component 
vectors forming together the random codebook is 
formed of three or less pulses having a unit amplitude 
for each of a pair of subframes which form together a 
frame, the position of the pulses being determined from 
a plurality of predetermined positions which a pulse can 
assume in a subframe so that a distortion in a synthe- 
sized speech is minimized. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a block diagram of a CELP coder according to the International Standard G.729 on which the invention is premised;

Fig. 2 is a block diagram of a decoder corresponding to the coder shown in Fig. 1;
Fig. 3 is a chart showing a bit allocation for coding parameters according to G.729 in each frame;
Fig. 4 is a chart showing a detail of a bit allocation for coding LSP coefficients shown in the chart of Fig. 3;
Fig. 5 is a chart showing a specific example of a random codebook shown in the chart of Fig. 3;
Fig. 6 is a chart showing an example of an 11-bit random codebook according to the invention;
Fig. 7 is a chart showing an example of a 9-bit random codebook;
Fig. 8 is a chart showing an example of a 10-bit random codebook;
Fig. 9 is a chart showing another example of an 11-bit random codebook;
Fig. 10 is a chart showing a further example of an 11-bit random codebook;
Fig. 11 is a chart showing a bit allocation for coding individual parameters when a single random codebook is employed;
Fig. 12 is a chart showing a bit allocation for coding individual parameters when a conjugate structure random codebook is employed;
Fig. 13 is a chart showing a bit allocation for coding individual parameters when a 9-bit random codebook is employed;
Fig. 14 is a chart showing a bit allocation for coding individual parameters when higher-order bits in the second stage of an LSP codebook are further reduced;
Fig. 15 is a chart showing a bit allocation for coding individual parameters when lower-order bits in the LSP codebook are further reduced; and
Fig. 16 is a chart showing a comparison of performance according to a subjective evaluation between the speech coding method of the invention and another coding method.

DETAILED DESCRIPTION OF THE PREFERRED
EMBODIMENTS

It is initially to be noted that the speech coding method of the invention premises the use of a coder as shown in Fig. 1 which conforms to the standard G.729. In the International Standard G.729, the coding system as shown in Fig. 1 employs a frame length of 10 ms and 80 bits per frame for the purpose of coding. When the bit rate is changed to 6.4 kbit/s while the same frame size is maintained, the number of bits used for coding must be reduced to 64 bits per frame, that is, by 16 bits per frame. It is then necessary to determine for which parameters the bit allocation may be reduced in the code structure for each frame as shown in Fig. 3, which is used in G.729, so that an effective reduction is achieved while any resulting degradation in the speech quality is kept at an unnoticeable level, thus realizing an optimum code structure at 6.4 kbit/s. However, because the 6.4 kbit/s coding operates as an extension of the 8 kbit/s coding (G.729), a smooth switching between the two must be assured. In other words, it is required that a good quality be achieved at 6.4 kbit/s and, at the same time, it is also necessary to prevent a clearly extraneous sound from being sensed upon switching to 8 kbit/s.

Example 1: reduction of bits used in coding pitch component vector

A pitch component vector has a great influence upon the decoded speech quality, and accordingly no bit reduction is made to the 13-bit pitch information in order to realize a high quality with the 6.4 kbit/s coding. In G.729, the most significant 6 bits in the 8-bit pitch information in the first subframe are protected by one parity bit. Thus, if a bit error occurs in the course of a transmission path, the error can be detected by the parity bit, and in such an instance, the pitch period of the previous subframe is substituted for the pitch period of the current subframe. Since the parity bit is wasteful when no error is present, the parity bit is deleted.
Example 2: reduction of bits used in coding LSP coefficients

G.729 employs an 18-bit LSP quantizer. The LSP quantizer comprises a two stage LSP codebook which employs a 4-th order interframe prediction (literature 4). A quantized LSP coefficient $\hat{\Omega}_n$ of an n-th frame is given as follows:

$$\hat{\Omega}_n = F_0 S_n + \sum_{i=1}^{4} F_i S_{n-i}, \qquad \sum_{i=0}^{4} F_i = I \tag{1}$$

where $F_i$ represents a diagonal matrix of prediction coefficients for interframe prediction, $I$ a unit matrix, and $S_n$ a second stage vector quantization output using the LSP codebook during the n-th frame (or current frame).

A quantization vector $S_n$ which is output from the LSP codebook is represented as a sum of a pair of codebooks as indicated below:

$$S_n(j) = \begin{cases} S_{1j} + S_{2j}^{L} & \text{for } j = 0, \ldots, 4 \\ S_{1j} + S_{2j}^{H} & \text{for } j = 5, \ldots, 9 \end{cases} \tag{2}$$

where $S_{1j}$ is an output (7 bits) from the first stage LSP codebook, $S_{2j}^{L}$ a lower order output (5 bits) from the second stage as indicated in the chart of Fig. 3, and $S_{2j}^{H}$ a higher order output (5 bits) from the second stage.

A search is made for the quantized LSP coefficient $\hat{\Omega}_n$ which, for an input LSP coefficient $\Omega_n$, minimizes a distortion $d_{sp}$ defined as indicated below:

$$d_{sp} = (\Omega_n - \hat{\Omega}_n)^T W_n (\Omega_n - \hat{\Omega}_n) \tag{3}$$

In this equation, $W_n$ represents a weighting coefficient obtained from the input LSP coefficient. Of the bits allocated to LSP coding, those of the first stage LSP codebook $S_{1j}$ and of the prediction coefficient $F_i$ have a great influence upon the performance. The lower the order of the LSP coefficient, the greater the impact upon the speech quality.
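The relationship between the two-stage codebook output, the interframe prediction and the weighted distortion may be easier to follow in code. The sketch below is a minimal illustration under stated assumptions: the diagonal matrices $F_i$ and the weighting $W_n$ are stored as plain lists of their diagonal entries, and all numeric values and function names are hypothetical rather than taken from the G.729 tables.

```python
ORDER = 10  # LSP order


def second_stage_output(s1, s2_low, s2_high):
    """Sum of first-stage vector and split second-stage vectors (equation (2))."""
    return [s1[j] + (s2_low[j] if j < 5 else s2_high[j - 5]) for j in range(ORDER)]


def quantized_lsp(f_diag, s_history):
    """Interframe prediction (equation (1)).

    f_diag[i] holds the diagonal of F_i for i = 0..4;
    s_history[i] holds S_{n-i}, so s_history[0] is the current frame.
    """
    return [sum(f_diag[i][j] * s_history[i][j] for i in range(5)) for j in range(ORDER)]


def lsp_distortion(omega, omega_hat, w_diag):
    """Weighted squared error (equation (3)), assuming a diagonal W_n."""
    return sum(w_diag[j] * (omega[j] - omega_hat[j]) ** 2 for j in range(ORDER))


# Illustrative data: equal prediction weights that sum to the unit matrix.
s1 = [0.1 * (j + 1) for j in range(ORDER)]
s_n = second_stage_output(s1, [0.01] * 5, [-0.01] * 5)
f_diag = [[0.2] * ORDER for _ in range(5)]
omega_hat = quantized_lsp(f_diag, [s_n] * 5)
distortion = lsp_distortion(s_n, omega_hat, [1.0] * ORDER)
print(distortion)
```

With identical history vectors and prediction weights summing to the unit matrix, the predicted vector equals the codebook output, so the distortion against that same vector is essentially zero.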

To achieve the 6.4 kbit/s coding, a bit reduction is made from the second stage LSP codebook, which is considered to contribute relatively less to the performance. Since the second stage LSP codebook is used to quantize a component which remains when an output from the first stage LSP codebook is subtracted from the input LSP, the second stage LSP codebook assumes a random value. The LSP coefficient assumes a value in a range from 0 to π.

Case (1): The bits in the second stage higher order LSP codebook $S_{2j}^{H}$ are reduced from 5 bits to 4 bits, thus forming a codebook using 16 codes having an index number from 0 to 15. A 4-bit LSP codebook which is suitable for use in the 6.4 kbit/s coding may be chosen by selecting appropriate codes from the 5-bit LSP codebook which is destined for use in the 8 kbit/s coding. Alternatively, codes having a sequential index number from 0 to 15 may simply be chosen from the codes in the 5-bit LSP codebook, which have index numbers from 0 to 31.

It is to be understood that in the 8 kbit/s coding (G.729), the second stage LSP codebook is designed to provide an optimum result when 5 bits are used. It is then contemplated to provide a re-learning of the second stage codebook so that an optimum result is obtained when 4 bits are used. In this instance, it is necessary to provide a second stage higher order LSP codebook for use in the 6.4 kbit/s coding, in addition to the second stage higher order codebook for use in the 8 kbit/s coding. An augmentation required for the memory to provide the new codebook is equal to 80 words (5-th order vector × 16 = 80).

Case (2): Similarly, the bits in the second stage higher order LSP codebook may be reduced by two bits (thus changing from a 5-bit codebook to a 3-bit codebook). In a similar manner as mentioned above, part of the original codebook may be used. Alternatively, a second stage higher order LSP codebook which has 3 bits and which provides an optimum result may be prepared by re-learning.

Case (3): 1 bit may be reduced from the second stage higher order LSP codebook $S_{2j}^{H}$ and also 1 bit may be reduced from the lower order LSP codebook $S_{2j}^{L}$ (thus changing each from a 5-bit to a 4-bit codebook).

In a similar manner as mentioned above in connection with Case (2), it is possible to use part of the original LSP codebook, or alternatively, a higher order LSP codebook and a lower order LSP codebook, each having 4 bits and providing an optimum result, may be prepared by re-learning. Such choices may be used in combination. For example, the lower order codebook is subject to re-learning while the higher order codebook comprises a part of the original codebook.

Example 3: Reduction of a bit or bits from the random 
codebook 

As shown in the chart of Fig. 5, in G.729 the random component vector of each subframe is represented by 4 pulses, and there are provided 8, 8, 8 and 16 positions which the 4 pulses #0 to #3 can assume. These positions are indicated by using 13 bits, and one bit is used for the polarity of each pulse. In accordance with the invention, to provide a method of reducing a bit or bits most effectively while suppressing a degradation in the quality of decoded speech to an unnoticeable level, several cases will be described below for reducing a bit or bits which are allocated to coding random component vectors.

Case (1): As shown in the chart of Fig. 6, a random component vector is represented in terms of two pulses #0 and #1 for each subframe. Sixteen positions are available for the pulse #0 and can be represented by 4 bits. 32 positions are available for the pulse #1 and can be represented by 5 bits. One polarity bit is allocated to each of the pulses #0 and #1. In this manner, a total of (4+5+2=) 11 bits are allocated to each subframe. This allows the number of bits which are allocated to coding the random component vector in one frame to be reduced from 34 bits for the arrangement of G.729 to 22 bits.

A codebook for random component vectors according to the pulse structure shown in Fig. 6 includes $2^{11}$ vectors, and a search for the pulse position is made in a manner such that a distortion of a speech which is provided by the synthesis filter 16 by synthesizing random component vectors C as exciting vectors relative to an input speech waveform vector (target vector) X is minimized. Representing the impulse response matrix of the synthesis filter 16 by H, the distortion $d_r$ is given as follows:

$$d_r = \|X\|^2 - \frac{(X^T H C_k)^2}{\|H C_k\|^2} = \|X\|^2 - \frac{(d^T C_k)^2}{C_k^T \Phi C_k} \tag{4}$$

where d represents the correlation vector between X and H, or $d = H^T X$, and $\Phi$ the correlation matrix of H, or $\Phi = H^T H$. d and $\Phi$ are calculated in advance, and a calculation is made of $(d^T C_k)^2 / C_k^T \Phi C_k$ for each vector candidate $C_k$ in order to select from the random codebook 22 an exciting vector (random component vector) $C_k$ which minimizes $d_r$, that is, which maximizes this ratio. Exciting vectors comprise pulses having amplitudes of 0 or ±1. Accordingly, the calculation according to the equation (4) can take place by a multiplication of a sign and an addition, in the similar manner as indicated for G.729 in the literature 4. A shape codebook of such exciting vectors is called an algebraic codebook.
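The search over a two-pulse algebraic codebook can be sketched as follows. This is a minimal illustration, not the G.729 reference routine: the track layouts `TRACK0` and `TRACK1`, the impulse response `h` and all function names are hypothetical, since the actual positions of Fig. 6 are not reproduced in the text. The sketch evaluates the ratio $(d^T C_k)^2 / C_k^T \Phi C_k$ using only signed additions of precomputed entries of d and $\Phi$.

```python
def convolution_matrix(h, n):
    """Lower-triangular Toeplitz matrix H with H[i][j] = h[i - j]."""
    return [[h[i - j] if 0 <= i - j < len(h) else 0.0 for j in range(n)]
            for i in range(n)]


def search_two_pulses(x, h, track0, track1):
    """Exhaustive search for the pulse pair maximizing (d^T C)^2 / (C^T Phi C)."""
    n = len(x)
    H = convolution_matrix(h, n)
    # d = H^T X and Phi = H^T H are computed once, before the search.
    d = [sum(H[i][j] * x[i] for i in range(n)) for j in range(n)]
    phi = [[sum(H[i][a] * H[i][b] for i in range(n)) for b in range(n)]
           for a in range(n)]
    best = None
    for p0 in track0:
        for p1 in track1:
            if p0 == p1:
                continue  # two unit pulses cannot share one position
            for s0 in (1, -1):
                for s1 in (1, -1):
                    num = (s0 * d[p0] + s1 * d[p1]) ** 2
                    den = phi[p0][p0] + phi[p1][p1] + 2 * s0 * s1 * phi[p0][p1]
                    if den <= 0.0:
                        continue
                    ratio = num / den
                    if best is None or ratio > best[0]:
                        best = (ratio, p0, s0, p1, s1)
    return best


# Hypothetical tracks: 16 positions (4 bits) and 32 positions (5 bits).
TRACK0 = list(range(0, 32, 2))
TRACK1 = list(range(8, 40))

# Build a target that is exactly the response to pulses +1 at 4 and -1 at 21.
h = [1.0, 0.6, 0.36, 0.2, 0.1]
c_true = [0.0] * 40
c_true[4], c_true[21] = 1.0, -1.0
H_demo = convolution_matrix(h, 40)
x = [sum(H_demo[i][j] * c_true[j] for j in range(40)) for i in range(40)]

best = search_two_pulses(x, h, TRACK0, TRACK1)
print(best[1:])
```

Because the target in this example is itself the filtered response of one codebook vector, the search recovers the pulse positions 4 and 21 with opposite signs; a vector and its negation give the same ratio, so the overall sign is resolved later by the gain.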

During the search for a pulse position, an optimum solution can be found by calculating $d^T C_k$ for all combinations of track 0 and tracks 1, 2. However, to reduce the amount of calculation, it is also possible to employ a simplification such as initially determining the position of only the track 0.

Case (2): A 9-bit random codebook shown in Fig. 7 is used. As shown in Fig. 7, the exciting pulse structure comprises a pair of pulses in each subframe, which have opposite polarities, providing 16 available positions for each pulse. Conversely, there are defined eight unavailable positions. Accordingly, each of the two pulse positions can be represented in terms of four bits, and there is provided one bit which serves to reverse the polarities of the two pulses simultaneously. In this manner, 9 bits are allocated to each subframe. Thus, by using a 9-bit random codebook, the number of bits can be reduced by as many as 8 bits per subframe or 16 bits per frame. The 9-bit random codebook comprises an 8-bit shape codebook together with one polarity bit. In this instance, it is possible to use a random signal directly as an exciting vector for the shape codebook or to produce an exciting vector by a learning process.

Alternatively, the random codebook may be divided into a pair of sub-codebooks. Thus a conjugate-structure codebook in which an exciting vector is represented as a sum of a pair of sub-vectors may be used. By way of example, a combination of a 3-bit shape codebook together with one sign bit or a combination of a 4-bit shape codebook together with one sign bit may be used. It is also possible to represent the exciting vector by a pulse having an amplitude of 1 in the similar manner as in G.729.

Case (3): A 10-bit random codebook as shown in Fig. 8 is used.

The 10-bit random codebook as shown in Fig. 8 comprises random component vectors where each subframe comprises a pair of pulses, in the similar manner as described above in connection with Fig. 7. However, in the instance of Fig. 8, one polarity bit is associated with each pulse so that the polarity of each of the pair of pulses can be independently selected. By using this random codebook, the number of bits can be reduced by as many as 7 bits per subframe, or 14 bits per frame. The 10-bit random codebook comprises a 9-bit shape codebook together with one polarity bit associated with each pulse. In this instance, a random signal may be used directly as an exciting vector for the shape codebook, or an exciting vector may be produced by a learning process.

Alternatively, a conjugate-structure codebook may be used in which an exciting vector is represented as a sum of a pair of sub-vectors by dividing the random codebook into a pair of sub-codebooks. By way of example, it is possible to use a combination of a 4-bit shape codebook together with one sign bit or a combination of a 4-bit shape codebook together with one sign bit. It is also possible to represent an exciting vector by a pulse having an amplitude of 1 in the similar manner as in G.729.

Case (4): An 11-bit random codebook as shown in Fig. 9 is used.

In the example shown in Fig. 9, a subframe is constructed with three pulses. Eight available positions are given to each of the pulses #1 and #0 while sixteen available positions are given to the pulse #2. Accordingly, a total of (3+3+4 =) 10 bits are allocated to define the positions of the three pulses. The relative polarity of the three pulses is predetermined. For example, pulses i0 and i1 are positive while pulse i2 is negative. There is also provided another bit which controls a simultaneous reversal of the polarity of these three pulses. By using the 11-bit random codebook, the number of bits can be reduced by as many as 6 bits per subframe or 12 bits per frame. The 11-bit random codebook comprises a 10-bit shape codebook together with one sign bit. In this instance, it is possible to use a random signal directly as an exciting vector for the shape codebook or to produce an exciting vector by a learning process.

Alternatively, a conjugate-structure codebook in which an exciting vector is represented by a sum of a pair of sub-vectors may be used by dividing a random codebook into a pair of sub-codebooks. By way of example, a combination of a 5-bit shape codebook together with one sign bit or a combination of a 4-bit codebook together with one sign bit may be used. It is also possible to represent an exciting vector by a pulse having an amplitude of 1 in the similar manner as in G.729.

The structure shown in Fig. 9 is not always limited to its use for three pulses, but may also be used selectively for two pulses or three pulses. Fig. 10 shows such a structure. Specifically, no pulse is placed at position 38, and when the pulse i2 indicates 38, only the pulses i0 and i1 are used. When the pulse i1 indicates 37, only the pulses i0 and i2 are used. In this instance, 38 is not used with the pulse i2. In addition, when the pulse i0 indicates 35, only the pulses i1 and i2 are used. In this instance, the pulse i1 is not placed at 37. By conducting a search according to this rule, an optimum one can be searched among combinations of two pulses or three pulses.
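The bit counts quoted for the cases of Example 3 follow directly from the number of positions per pulse and the number of polarity bits. The Python sketch below tallies them; the function and dictionary names are illustrative only, and the position counts are those stated in the text for Figs. 5 to 9.

```python
from math import log2


def codebook_bits(positions_per_pulse, polarity_bits):
    """Bits per subframe: one position index per pulse plus the polarity bits."""
    position_bits = sum(int(log2(n)) for n in positions_per_pulse)
    return position_bits + polarity_bits


# Positions per pulse and polarity bits as described in the text.
CASES = {
    "G.729 (Fig. 5)": codebook_bits([8, 8, 8, 16], 4),
    "Case (1), Fig. 6": codebook_bits([16, 32], 2),
    "Case (2), Fig. 7": codebook_bits([16, 16], 1),
    "Case (3), Fig. 8": codebook_bits([16, 16], 2),
    "Case (4), Fig. 9": codebook_bits([8, 8, 16], 1),
}

BASELINE = CASES["G.729 (Fig. 5)"]
# Two subframes per frame, so the saving per frame is doubled.
SAVINGS_PER_FRAME = {name: 2 * (BASELINE - bits) for name, bits in CASES.items()}

for name, bits in CASES.items():
    print(name, bits, "bits/subframe, saves", SAVINGS_PER_FRAME[name], "bits/frame")
```

The tally gives 17, 11, 9, 10 and 11 bits per subframe, and savings of 12, 16, 14 and 12 bits per frame for Cases (1) to (4), in agreement with the figures given above.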

Example 4: Example of search among random codebook

In order to improve the quality of the 6.4 kbit/s coding, a conditional orthogonalization is introduced into the search of the random exciting vector. During the CELP coding, when a search of the random codebook is made, a k-th random component vector $C_k$ from the random codebook 22 is applied as an exciting vector to the synthesis filter 16 (thus choosing gains $g_P = 0$, $g_N = 1$), and an exciting vector (random component vector) is selected which minimizes the distortion of an output synthesized speech $HC_k$ relative to the input speech vector (target vector) X, as given by the equation (4).

When a random component vector is used for syn- 
thesis with a pitch component vector to code an input 
speech, it is known that the quality of synthesized 
speech can be enhanced by orthogonalizing an output 
from the synthesis filter 16 or by removing a component 
contained in the random component vector and which is 
parallel to the pitch component vector subsequent to the 
determination of the pitch component vector and during 
a search of an optimum random component vector from 
the random codebook in consideration of the deter- 
mined pitch component vector. 

A random exciting vector $H^{\perp}C_k$ which is orthogonalized with respect to the pitch component vector P is given as follows:

$$H^{\perp}C_k = HC_k - \frac{(HC_k)^T HP}{\|HP\|^2}\, HP \tag{5}$$



When an optimum gain for the exciting vector is determined, the distortion $d_r$ between the target vector X and the synthesized speech is represented as follows:

$$d_r = \|X\|^2 - \frac{(X^T H^{\perp}C_k)^2}{\|H^{\perp}C_k\|^2} \tag{6}$$



Accordingly, to minimize the distortion, a search is made for a random component vector $C_k$ which maximizes the second term on the right side of the equation (6):

$$\frac{(X^T H^{\perp}C_k)^2}{\|H^{\perp}C_k\|^2} \tag{7}$$



The numerator of the second term of the equation (6) can be modified as follows:

$$X^T H^{\perp}C_k = \tilde{X}^T HC_k, \quad \text{where } \tilde{X} = X - \frac{X^T HP}{\|HP\|^2}\, HP \tag{8}$$

Here $\tilde{X}$ is the target vector X orthogonalized with respect to the excitation output HP of the pitch component vector P. The modification reduces the calculation to the calculation of the numerator in the equation (4).

On the other hand, the denominator of the equation (7) can be written as follows:

$$\|H^{\perp}C_k\|^2 = \|HC_k\|^2 - \frac{\left((HP)^T HC_k\right)^2}{\|HP\|^2} \tag{9}$$

where $1/\|HP\|^2$ (= A) is a constant, and by putting $E^T = (HP)^T H$, the equation (9) is reduced as follows:

$$\|H^{\perp}C_k\|^2 = \|HC_k\|^2 - A\,(E^T C_k)^2 \tag{10}$$



$E^T C_k$ can be obtained from E by adding values at points corresponding to the pulse positions for the number of pulses. The increase in the amount of calculation caused by the orthogonalization is limited to the component $A(E^T C_k)^2$, which is very slight.
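The saving offered by the equation (10) can be illustrated numerically. The sketch below compares a direct evaluation of the orthogonalized energy, obtained by forming the orthogonalized vector of the equation (5), with the shortcut that only adds signed entries of E at the pulse positions. The matrix H, the pitch vector P, the pulse list and the function names are all hypothetical example data, not values from the specification.

```python
def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def energy_direct(H, P, C):
    """||H_perp C||^2 by explicitly removing the component parallel to HP."""
    hc, hp = matvec(H, C), matvec(H, P)
    scale = dot(hc, hp) / dot(hp, hp)
    residual = [a - scale * b for a, b in zip(hc, hp)]
    return dot(residual, residual)


def energy_fast(H, P, pulses):
    """||H C||^2 - A (E^T C)^2, with E^T C summed over pulse positions only."""
    n = len(H)
    hp = matvec(H, P)
    A = 1.0 / dot(hp, hp)                       # constant for the whole search
    E = [sum(H[i][j] * hp[i] for i in range(n)) for j in range(n)]  # E = H^T (HP)
    C = [0.0] * n
    for pos, sign in pulses:
        C[pos] = float(sign)
    hc = matvec(H, C)                           # in practice taken from Phi
    etc = sum(sign * E[pos] for pos, sign in pulses)
    return dot(hc, hc) - A * etc ** 2


# Hypothetical 6-sample example.
h = [1.0, 0.5, 0.25]
H = [[h[i - j] if 0 <= i - j < len(h) else 0.0 for j in range(6)] for i in range(6)]
P = [0.3, -0.2, 0.5, 0.1, -0.4, 0.2]
pulses = [(1, 1), (4, -1)]                      # (position, sign)
C = [0.0, 1.0, 0.0, 0.0, -1.0, 0.0]

direct = energy_direct(H, P, C)
fast = energy_fast(H, P, pulses)
print(direct, fast)
```

Both routes give the same energy up to rounding, while the shortcut needs only one extra multiply-and-subtract per candidate once A and E have been prepared.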

When the random exciting vector has a high degree of freedom, the orthogonalization improves the speech quality. However, when an algebraic codebook as shown in Figs. 6 to 10 is used as the random codebook, there is a greater limitation on the pulse position in the random exciting vector even though the amount of calculation required for the search is reduced, and hence the quality is not always improved. For this reason, the search according to the equation (7) is effected only when an orthogonalized search is desirable, but otherwise the search according to the equation (4) is effected. An optimum gain $g_{P\_opt}$ for the pitch is used as the condition to effect such a switching. The optimum pitch gain is described as follows:

$$g_{P\_opt} = \frac{X^T HP}{\|HP\|^2} \tag{11}$$


When the pitch gain is high, the pitch component 
has a greater contribution, and accordingly, the orthog- 
onalization with respect to the pitch component vector is 
effective. Accordingly, only when the following condition: 



9p_o P i * 9t 



(12) 



is satisfied, the orthogonalized search is effected. The 
threshold $g_{th}$ may have a value such as 0.5, for 
example. Alternatively, an estimated gain for the pitch as 
given below: 



$$P_r = 20 \log \left\{ \frac{\|X\|^2}{\|X - HP\|^2} \right\} \tag{13}$$



may be used as the switching condition. In this equa- 
tion, X represents an input speech waveform vector and 
HP a pitch waveform vector. As mentioned previously, 
the orthogonalized search is effected only when the 
estimated gain for the pitch is high. 
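
The switching rule of equations (11) to (13) can be sketched in Python as follows. This is a minimal illustration, not the coder's actual implementation: the function names are hypothetical, the logarithm in equation (13) is assumed to be base 10, and the default threshold of 0.5 is simply the example value mentioned in the text.

```python
import numpy as np


def optimum_pitch_gain(X, HP):
    """Equation (11): g_p_opt = X^T HP / ||HP||^2."""
    return float(X @ HP) / float(HP @ HP)


def estimated_pitch_gain(X, HP):
    """Equation (13) as printed: 20 log(||X||^2 / ||X - HP||^2), base 10 assumed."""
    residual = X - HP
    return 20.0 * np.log10(float(X @ X) / float(residual @ residual))


def use_orthogonalized_search(X, HP, g_th=0.5):
    """Equation (12): choose the orthogonalized search only when g_p_opt >= g_th."""
    return optimum_pitch_gain(X, HP) >= g_th
```

A coder following this scheme would call `use_orthogonalized_search` once per subframe and then run either the search of equation (7) or that of equation (4).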

Example 5: Reduction of bit or bits from gain codebook 

In G.729, a gain codebook having 7 bits per 
subframe is used to quantize the pitch gain and the gain of 
the random exciting vector. The respective gains $g_p$ and 
$g_N$ are each represented by a sum of entries from a pair 
of sub-codebooks. When this codebook was prepared, a 
learning process was used which takes transmission 
path errors into consideration. With such learning, the 
influence of an error is reduced if an error occurs in the 
bits of a gain code in the course of transmission. This 
is achieved at the sacrifice of some degradation in the 
quality of reproduced speech under an error-free 
condition, as compared with the quality of speech 
reproduced using a codebook which is obtained without 
consideration of such a transmission error. 

In the embodiment described here, a 6-bit gain 
codebook is produced by reducing a bit from the 
gain codebook employed in G.729. Since the gain 
codebook is reduced by one bit, the reproduced speech 
signal would ordinarily be degraded in quality. In 
this embodiment according to the present invention, 
the degradation in the reproduced speech quality relative 
to the use of the 7-bit codebook can be suppressed 
by preparing the gain codebook with a bit error 
rate which is less than the bit error rate (= 0.5%) 
employed in the preparation of the gain codebook 
according to G.729. The new codebook can be 
formed as a single codebook for vector quantization in 6 
bits. Alternatively, it may be divided into a pair of 3-bit 
codebooks as a conjugate codebook in a similar manner 
as in G.729. When the pair of codebooks is used, the 
increase in memory capacity required by the 
use of the new gain codebook remains as small as 
32 words (8x2x2 = 32). 
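
The (3+3)-bit conjugate structure and the 32-word figure can be illustrated with a short Python sketch. Everything specific in it is an assumption made for illustration: the table contents are random placeholders rather than trained values, the way the 6-bit index is split into two 3-bit sub-indices is hypothetical, and the names `CB_A`, `CB_B`, `decode_gains` and `memory_words` do not come from G.729.

```python
import numpy as np

SUB_BITS = 3                       # bits per sub-codebook
ENTRIES = 1 << SUB_BITS            # 8 entries per sub-codebook

# Placeholder tables: 8 entries x 2 gain components each (NOT trained values).
rng = np.random.default_rng(0)
CB_A = rng.uniform(0.0, 1.0, size=(ENTRIES, 2))
CB_B = rng.uniform(0.0, 1.0, size=(ENTRIES, 2))


def decode_gains(index):
    """Split a 6-bit index into two 3-bit sub-indices and sum the selected entries."""
    if not 0 <= index < ENTRIES * ENTRIES:
        raise ValueError("index must fit in 6 bits")
    i = index >> SUB_BITS          # upper 3 bits select from CB_A (assumed packing)
    j = index & (ENTRIES - 1)      # lower 3 bits select from CB_B
    g_p, g_n = CB_A[i] + CB_B[j]   # each gain is a sum over the pair of sub-codebooks
    return float(g_p), float(g_n)


def memory_words():
    """Storage for both sub-codebooks: 8 entries x 2 components x 2 tables."""
    return CB_A.size + CB_B.size
```

For comparison, a single 6-bit vector-quantization table of the same dimension would hold 64 entries of 2 components, that is 128 words.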

Example 6: Example of 6.4 kbit/s coder 

As a result of the above considerations, a coder is 
designed as described below. 

Case (1): A bit or bits are reduced only from the ran- 
dom codebook. 

By reducing a bit or bits only from the random 
codebook, a 9-bit random codebook is used. Shown in the 
column for Coder A of Fig. 11 is an example of bit 
allocation for coding individual parameters when a 
single 9-bit (8 bits for shape and one bit for polarity) 
random codebook is used. Shown in the column for Coder 
D of Fig. 12 is an example of bit allocation for coding 
individual parameters when a 9-bit ((4+3) bits for shape 
and (1+1) bits for polarity) conjugate-structure random 
codebook is used. Also shown in the column for Coder 
G of Fig. 13 is an example of bit allocation when a 9-bit 
(two pulses; four bits for each pulse position and one 
polarity bit for the two pulses) random codebook is used. 

Case (2): Parity bits are reduced, and the higher order 
bits in the second stage of the LSP codebook are reduced 
by one bit to 4 bits, employing a 10-bit random codebook. 

Shown in the column for Coder B of Fig. 11 is an 
example of bit allocation when a 10-bit (9 bits for shape 
and one polarity bit) single random codebook is used. 
Shown in the column for Coder E of Fig. 12 is an 
example of bit allocation when a 10-bit ((4+4) bits for shape 
and (1+1) bits for polarity) conjugate-structure random 
codebook is used. Shown in the column for Coder H of 
Fig. 13 is an example of bit allocation when a 10-bit (two 
pulses; four bits for each pulse position and one bit each 
for the polarity of each pulse) random codebook is used. 

Case (3): Parity bits are reduced, the higher order 
bits in the second stage of the LSP codebook are reduced 
by one bit to 4 bits, and one bit is reduced from the gain 
codebook to 6 bits, using an 11-bit random codebook. 

Shown in the column for Coder C of Fig. 11 is an 
example of bit allocation when an 11-bit (10 bits for 
shape and one polarity bit) single random codebook is 
used. Shown in the column for Coder F of Fig. 12 is 
an example of bit allocation when an 11-bit ((4+5) bits for 
shape and (1+1) bits for polarity) conjugate-structure 
random codebook is used. Shown in the column for 
Coder I of Fig. 13 is an example of bit allocation 
when an 11-bit (three pulses; (3+3+4) bits for the respective 
pulse positions and one polarity bit for the three pulses) 
random codebook is used. In this instance, the 2-3 
pulse type random codebook may be used as the 11-bit 
random codebook mentioned above. The gain 
codebook may comprise either a 6-bit collective codebook 
or a (3+3) conjugate-structure codebook. 

Case (4): Instead of reducing the parity bits in the 
Cases (2) and (3), a further bit may be reduced from the 
higher order bits of the second stage of the LSP 
codebook, thus reducing a total of two bits (Coders J and K 
of Fig. 14). 

Case (5): Instead of reducing the parity bits in the 
Cases (2) and (3), one bit may be reduced from the 
lower order bits of the second stage of the LSP 
codebook, thus reducing them to a total of 4 bits 
(Coders L and M of Fig. 15). 

Case (6): In the Cases (1) to (5), a conventional 
search for the random exciting vector [a search accord- 
ing to the equation (4)] or an orthogonalized search with 
respect to the pitch waveform [a search according to the 
equation (7)] may be used. Alternatively, switching 
between the two may be performed depending on a 
certain condition. 
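
The bit budgets behind the Cases above can be checked with a small Python sketch. It assumes the standard 80-bit G.729 frame allocation per 10 ms frame (18 bits for LSP, 8+5 bits for pitch delay, 1 pitch parity bit, 17 bits per subframe for the random codebook and 7 bits per subframe for the gains), which is quoted from the G.729 Recommendation rather than from Figs. 11 to 15, which are not reproduced here. The function and parameter names are hypothetical.

```python
FRAME_MS = 10  # frame length in milliseconds; two subframes per frame

# Assumed baseline: G.729 allocation per 10 ms frame (80 bits in total).
G729_BITS = {
    "lsp": 18,
    "pitch_delay": 8 + 5,
    "pitch_parity": 1,
    "random_codebook": 2 * 17,
    "gain": 2 * 7,
}


def bits_per_frame(random_bits, drop_parity=False, lsp_bits_removed=0, gain_bits=7):
    """Total bits per frame after the reductions described in the Cases."""
    alloc = dict(G729_BITS)
    alloc["random_codebook"] = 2 * random_bits   # per-subframe codebook size
    alloc["gain"] = 2 * gain_bits
    if drop_parity:
        alloc["pitch_parity"] = 0
    alloc["lsp"] -= lsp_bits_removed
    return sum(alloc.values())


def bit_rate_kbps(bits):
    """Bits per frame divided by frame length in ms gives kbit/s."""
    return bits / FRAME_MS


case1 = bits_per_frame(9)
case2 = bits_per_frame(10, drop_parity=True, lsp_bits_removed=1)
case3 = bits_per_frame(11, drop_parity=True, lsp_bits_removed=1, gain_bits=6)
```

Under these assumptions each of the three allocations comes to 64 bits per 10 ms frame, that is 6.4 kbit/s, and the variants of Cases (4) and (5) reach the same total by removing two LSP bits while keeping the parity bit.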

Evaluation Experiment 

Using a subjective evaluation, the performance of a 
coding method has been evaluated in which the bit 
allocation for the coder corresponds to the Case (3), using 
an 11-bit algebraic random codebook of the 2-3 pulse type 
with switching of the searches depending on the optimum 
gain for the pitch. The evaluation was made on five levels 
from level 1 to level 5. There were 24 listeners. 

For purposes of comparison, 24 kbit/s ADPCM, 8 
kbit/s G.729 and 6.3 kbit/s G.723.1 are used as 
different coding methods. G.723.1 uses a long frame length 
of 30 ms and performs coding with a look-ahead of 
7.5 ms. The present 6.4 kbit/s coding method uses a 
frame length of 10 ms and a look-ahead of 5 ms. 
Results are shown in Fig. 16. 

It will be seen that the method according to the 
invention achieves a quality which is equivalent to 
G.723.1 as referenced to an input speech level (-26 dB) 
even though the number of pulses representing a ran- 
dom component vector is reduced to three or less and a 
bit allocation for coding is greatly reduced. An equiva- 
lent quality is also achieved when there is a level varia- 
tion (-16 dB, -36 dB). As judged from a result for a 
random bit error of 0.1%, it is seen that no significant 
degradation is recognized if the pitch parity is omitted. 
From a result of switching between 6.4 kbit/s and 8 
kbit/s every 10 ms interval, it is seen that a degradation 
caused by the switching is reduced. 

EFFECTS OF THE INVENTION 

As described, in accordance with the invention, by 
reducing the number of pulses which represent a first 
and a second sub-vector of each of the random component 
vectors comprising a random codebook to three or 
less, it is possible to reduce the number of bits allocated 
for coding without causing a significant degradation in 
the speech quality. By combining the method of the 
invention with a reduction of allocated bits through a 
modification of the coding modules and tables for other 
parameters of G.729 (8 kbit/s), 6.4 kbit/s coding can be 
realized, allowing either bit rate to be selected depending 
on the capacity of the channel or the application. In this 
manner, communication is enabled even when a sufficient 
transmission capacity is not secured. In addition, by 
realizing the coding while using modules which are 
common with G.729, the bit rate can be made selectable as 
required while suppressing an increase in the 
memory capacity or the like. 

Claims 

1. A speech coding method according to ACELP in 
which an LSP coefficient, a pitch component vector, 
a random component vector, and gain vectors 
which are applied to the pitch component vector 
and the random component vector are coded using 
an LSP codebook, an adaptive codebook, a ran- 
dom codebook and a gain codebook, respectively, 
such that a distortion relative to an input speech 
waveform vector is minimized for each frame; 

comprising the step of coding the random 
component vector such that each of random com- 
ponent vectors forming together the random code- 
book is formed of three or less pulses having a unit 
amplitude for each of a pair of subframes which 
form together a frame, the positions of the pulses 
being determined from a plurality of predetermined 
positions which a pulse can assume in a subframe 
so that a distortion in a synthesized speech is mini- 
mized. 

2. A speech coding method according to Claim 1 in 
which the random codebook comprises a random 
codebook including a shape codebook formed by 
random signals or exciting vectors which define 
pulse positions and which are produced by a learn- 
ing process, and the polarities of the pulses. 

3. A speech coding method according to Claim 1 in 
which the random codebook comprises a conju- 
gate-structure random codebook which is repre- 
sented in terms of a pair of sub-vectors. 

4. A speech coding method according to Claim 1 in 
which each random component vector in the ran- 
dom codebook comprises a pair of sub-vectors, 
each sub-vector comprising a pair of pulses having 
a unit amplitude. 

5. A speech coding method according to Claim 1 in 
which each random component vector in the ran- 
dom codebook comprises a pair of sub-vectors, 
each sub-vector comprising three pulses having a 
unit amplitude. 

6. A speech coding method according to Claim 1 in 
which each random component vector in the random 
codebook comprises two sub-vectors, which 
may be selectively defined by two or three pulses 
each having a unit amplitude. 

7. A speech coding method according to Claim 1 in 
which a search for the random component vector 
using the random codebook takes place by a 
search in which the random component vector is 
made orthogonalized with respect to the pitch 
component vector when an optimum pitch gain has a 
value which exceeds a predetermined value, and 
takes place by a search without orthogonalization 
when the pitch gain does not exceed the 
predetermined value. 

8. A speech coding method according to one of 
Claims 2 to 6 in which a bit allocation for only the 
random codebook is reduced to implement 6.4 
kbit/s speech coding. 

9. A speech coding method according to Claim 1 in 
which the gain codebook comprises a 6 bit vector 
quantized gain codebook. 

10. A speech coding method according to Claim 1 in 
which the gain codebook comprises a (3+3) bit con- 
jugate-structure gain codebook. 

11. A speech coding method according to claim 9 or 10, 
wherein said gain codebook is created by the 
learning using a transmission bit error rate which is 
smaller than that employed in creation of a 
codebook by the learning according to said G.729. 

12. A speech coding method according to claim 11, 
wherein said transmission bit error rate used in the 
creation of said gain codebook is smaller than 
0.5%. 

13. A speech coding method according to Claim 1 in 
which bits are allocated to the code of pitch compo- 
nent vector without a parity bit. 

14. A speech coding method according to Claim 1 or 
13, in which the LSP coding comprises the steps of 
coding in a first stage using a first LSP codebook, 
and coding in a second stage using a second LSP 
codebook, the number of bits in the second LSP 
codebook being less than the number of bits in the 
second LSP codebook according to G.729 which is 
equal to 10. 

15. A speech coding method according to Claim 14 in 
which the second LSP codebook comprises a part 
of the second LSP codebook according to the 
G.729. 

16. A speech coding method according to Claim 14 in 
which the second LSP codebook comprises an 
LSP codebook which is prepared anew by a 
learning process. 

17. A speech coding method according to one of Claims 
14, 15 and 16 in which each vector forming the 
second LSP codebook has a number of bits in either a 
lower order or a higher order or in both which is less 
than five bits. 

18. A speech coding method according to Claim 17 in 
which the random codebook comprises a random 
codebook formed by a shape codebook formed by 
random signals or exciting vectors which define 
pulse positions and which are prepared by a learn- 
ing process, and the polarities of the pulses, thus 
enabling 6.4 kbit/s speech coding. 

19. A speech coding method according to Claim 17 in 
which the random codebook comprises a conju- 
gate-structure random codebook which is repre- 
sented in terms of a pair of sub-vectors, thereby 
enabling a 6.4 kbit/s speech coding. 

20. A speech coding method according to Claim 17 in 
which each random component vector in the ran- 
dom codebook comprises a pair of sub-vectors, 
each sub-vector comprising a pair of pulses having 
a unit amplitude, thereby enabling a 6.4 kbit/s 
speech coding. 

21. A speech coding method according to Claim 17 in 
which each random component vector in the ran- 
dom codebook comprises a pair of sub-vectors, 
each sub-vector comprising three pulses having a 
unit amplitude, thereby enabling a 6.4 kbit/s speech 
coding. 

22. A voice coding method according to Claim 17 in 
which each random component vector in the ran- 
dom codebook comprises two sub-vectors, which 
may be selectively formed by two or three pulses 
each having a unit amplitude, thereby enabling a 
6.4 kbit/s speech coding. 

23. A speech coding method according to Claim 17 in 
which the gain codebook comprises a (3+3) bit con- 
jugate-structure gain codebook. 

24. A speech coding method according to Claim 17 in 
which a search for the random component vector 
using the random codebook takes place by an 
orthogonalized search in which the random 
component vector is orthogonalized with respect to the 
pitch component vector when an optimum pitch 
gain has a value which exceeds a predetermined 
value, and takes place by a search without an 
orthogonalization when the pitch gain does not 